NSF PAR Search | NSF Public Access Repository

A Modular Vision Language Navigation and Manipulation Framework for Long Horizon Compositional Tasks in Indoor Environment

https://doi.org/10.3389/frobt.2022.930486

Saha, Homagni; Fotouhi, Fateme; Liu, Qisai; Sarkar, Soumik (July 2022, Frontiers in Robotics and AI)

In this paper we propose a new framework, MoViLan (Modular Vision and Language), for executing visually grounded natural language instructions for day-to-day indoor household tasks. While several data-driven, end-to-end learning frameworks have been proposed for targeted navigation tasks based on the vision and language modalities, performance on recent benchmark data sets has revealed a gap in developing comprehensive techniques for long-horizon, compositional tasks (involving manipulation and navigation) with diverse object categories, realistic instructions, and visual scenarios with non-reversible state changes. We propose a modular approach to the combined navigation and object interaction problem that does not need strictly aligned vision and language training data (e.g., in the form of expert-demonstrated trajectories). Such an approach is a significant departure from the traditional end-to-end techniques in this space and allows for a more tractable training process with separate vision and language data sets. Specifically, we propose a novel geometry-aware mapping technique for cluttered indoor environments and a language understanding model generalized for household instruction following. We demonstrate a significant increase in success rates for long-horizon, compositional tasks over recent works on the recently released benchmark data set ALFRED.
Full Text Available
Data-Driven Performance Monitoring of Dynamical Systems Using Granger Causal Graphical Models

https://doi.org/10.1115/1.4046673

Saha, Homagni; Liu, Chao; Jiang, Zhanhong; Sarkar, Soumik (August 2020, Journal of Dynamic Systems, Measurement, and Control)

Data-driven analysis and monitoring of complex dynamical systems have been gaining popularity for reasons such as ubiquitous sensing and advanced computation capabilities. A key rationale is that such systems are inherently high-dimensional and feature complex subsystem interactions, which makes the majority of first-principles-based methods insufficient. We explore the family of a recently proposed probabilistic graphical modeling technique, called the spatiotemporal pattern network (STPN), in order to capture the Granger causal relationships among observations in a dynamical system. We also show that this technique can be used for anomaly detection and root-cause analysis in real-life dynamical systems. In this context, we introduce the notion of Granger-STPN (G-STPN), inspired by Granger causality, and introduce a new nonparametric technique to detect causality among observations of dynamical systems. We experimentally validate our framework for detecting anomalies and analyzing root causes on a robotic arm platform and obtain superior results compared with previous frameworks that used other causality metrics.
Full Text Available
Battery-Free Camera Occupancy Detection System

https://doi.org/10.1145/3469116.3470013

Saffari, Ali; Tan, Sin Yong; Katanbaf, Mohamad; Saha, Homagni; Smith, Joshua R.; Sarkar, Soumik (June 2021, EMDL 2021: 5th International Workshop on Embedded and Mobile Deep Learning)

Occupancy detection systems are commonly equipped with high-quality cameras and a processor with high computational power to run detection algorithms. This paper presents a human occupancy detection system that uses battery-free cameras and a deep learning model implemented on a low-cost hub to detect human presence. Our low-resolution camera harvests energy from ambient light and transmits data to the hub using backscatter communication. We implement the state-of-the-art YOLOv5 network detection algorithm, which offers high detection accuracy and fast inferencing speed, on a Raspberry Pi 4 Model B. We achieve an inferencing speed of ~100 ms per image and an overall detection accuracy of >90% with only 2 GB of CPU RAM on the Raspberry Pi. In the experimental results, we also demonstrate that the detection is robust to noise, illuminance, occlusion, and angle of depression.
Full Text Available